The paper reviews previous research studies and conferences which have dealt with the question of whether large-scale testing programs are effective. It is concluded that such programs, defined as efforts to determine the status of student achievement on a school, district, state, or national basis, are not serving the informational needs of the decision-making bodies for whom they are designed. Three schools of thought are discussed concerning reasons why large-scale testing programs are not adequately responsive. These included those who believe that policymakers do not wish to make data-based decisions; those who believe the fault lies with ineffective dissemination and utilization subsystems; and those who challenge the suitability of large-scale testing programs, as currently operated, for serving the realities of educational policymaking. After discussing the nature of educational policymaking, the paper suggests three reasons why testing and assessment programs have failed to make the desired impact. These include: (1) such programs have not adequately defined the level at which their target audiences are most likely to make policy; (2) such programs seldom have the capacity to produce information which is "issue" oriented at a time when it is most needed by policymakers; and (3) few programs take into account that the policymaking process is characterized by "uncertainty" and by "competing value systems." (Author/2C)

At past meetings of the American Educational Research Association, a variety of papers and symposia have dealt with technical issues related to the improvement of large-scale testing programs, those programs which we are defining as efforts to determine the status of student achievement on a school, district, state or national basis. We know that such programs are increasing in number and cost (Hasrlc?, 1975). Their existence is commonly found as a prerequisite in the rhetoric for accountability in public education.


It appeared to many of us, however, that presentations in recent years have increasingly ended by asking two basic questions:



• Are such programs actually producing data which serves the informational needs of the decision-making bodies for whom they are designed? Or, to put it another way, is there any evidence that decisions, particularly of the policy nature, are being stimulated or enhanced by the data being produced by such programs?

• If such programs are not having the desired impact, why not? What can be done to improve this situation?

This symposium was organized specifically to focus on these two questions.

In answering the first question, it is my belief that there is now sufficient empirical evidence and a sufficient number of public declarations of subjective evaluations by both producers and consumers of large-scale testing data to admit that such programs are not meeting the decision-making needs of their target audiences. Let me quickly review some of this evidence.




Last year I presented the results of a nationwide survey of state assessment programs (Hall, 1975). One of the major findings of this study was that, with the exception of setting priorities for the allocation of funds under Title III of the Elementary and Secondary Education Act, less than a third of the states could "show evidence of using their assessment results to make the types of decisions which are frequently cited as a justification for the initiation of statewide assessment programs" (p. ii).

In the evaluation of the assessment program of a state that has provided national leadership in the so-called "accountability" movement, the authors declared that one of their most surprising findings was that the statewide assessment programs had "little apparent value to any major group" (House, Rivers & Stufflebeam, 1974, p.

Similar complaints about the lack of utility of statewide assessment programs, at least in the policy arena, were voiced at the 1975 National Forum for the Advancement of State Educational Assessment Programs and the 1974 conference of the National Association of State Boards of Education. The harshest comments were made by state legislators (or their staffs), by chief state school officers or by members of state educational boards. Remember, these are the groups frequently cited as target audiences for state assessment data.

In 1976, the Oregon Department of Education conducted an inventory and analysis of standardized norm-referenced tests administered in the state on a district- or school-wide basis (deJtacg, 1973). One portion of the study dealt with the use made by the schools of their test data. Nearly all reported that the target audiences for such test results were teachers and that such tests were used either for classroom planning or for assigning student grades. Fewer than five percent reported that the data was used for any policy purposes at the building or district level. Yet, at the Department of Education, we are daily receiving communications from teachers about a new Minimum School Standard which requires that districts initiate program assessment activities. These teachers report that the current tests being used by their districts or buildings are "useless," so why mandate further activity in this area?

Bleecher (1975) recently reported the results of a study in Michigan focusing on teacher attitudes towards accountability and assessment efforts at either the state or local levels. His teacher respondents reported, among other findings, that assessment information produced from whatever source (state or local) and transmitted to teachers resulted in (a) recommendations with which they felt they could not physically or mentally comply, and (b) information which was irrelevant to their personal interests. Given these two findings, one might assume that these assessment programs had little impact on policymaking at the classroom level, at least to the extent such policies were chosen or enacted by teachers.



These are some of the indicators which have led me personally to the conclusion that large-scale testing programs, as presently conducted, have minimal or no impact on the policymaking audiences for whom they are supposedly designed. They are sustained only by a life-support system which is connected to the tenuous cord of federal and state requirements or a host of good intentions. I also believe that we must either improve this situation or that one of these days budget-conscious policymakers are going to use the data to make a decision that I am certain none of us are after. They are going to decide that the body of large-scale testing is no longer breathing and pull the plug.

There are at least three major schools of thought as to why large-scale testing programs are not adequately responsive to educational policymakers at the national, state or local level.

The first school of thought says the problem rests with the policymakers themselves. They either do not want to make any decisions or they do not want to make any data-based decisions. Those of you who advocate this position will get a certain amount of argument from policy scholars in other fields. Lindblom (1968), for example, quotes a former director of the federal budgeting system who admonished:



The cynical view of the matter is that rational calculation in government programming is a harmless but ineffectual pursuit, since all important questions are ultimately decided on "political" grounds. . . . The thesis is wrong if it is taken to mean the findings of skilled and objective analysis of public programs are not influential in decision-making at the highest level. In fact, such findings are usually influential and, not infrequently, decisive (p.Tl).



There is also evidence from a recent study of the Institute for Social Research at the University of Michigan (1975) that top-level federal policymakers are looking for help from social scientists and are using the results of social science research to shape policy decisions. As quoted in Education Daily:



. . . Our data suggest that government executives do not need to be sold on the potential usefulness of scientific information, nor do they lack a reasoned appreciation of its value in molding important decisions (p. 5).



A second school of thought cites the inadequacies of the dissemination and utilization subsystems and practices of large-scale testing efforts as the major reason why such programs do not adequately serve or influence their target audiences.

A third viewpoint challenges the suitability of large-scale testing programs, at least as currently operated, for serving the realities of educational policymaking. It is on this viewpoint that I will concentrate the remainder of my remarks.



Definitions about what we mean by the educational "policymaking process" are legion. I happen to like the definition used by Mann (1975) that educational policymaking is a process characterized by deliberations on problems which ". . . are public in nature . . . are very consequential . . . complex . . . dominated by uncertainty . . . reflect and are affected by disagreement about the goals to be pursued" (p. ).



Given this definition, let me offer three reasons why I feel our current approach to large-scale testing programs is not suited to the realities of the educational policymaking process. In so doing, I hope to stimulate some thoughts on your part about specific steps you can take to improve the impact of large-scale testing programs (and, I might add, educational research and evaluation findings as well).



(1) Such programs have not adequately defined the level at which their target audiences are most likely to make policy.



Most directors of testing can tell you who they think their target audiences are; i.e., state legislators, local school board members, state or district school administrators, classroom teachers and so forth. But that is as far as their analysis has generally proceeded. They have not accounted for the fact that any one of these target audiences is likely to make policy at a variety of levels and that seldom will the same data serve this variety of needs. The literature on policy analysis or policymaking is full of definitions about different levels of policymaking. Simon (1960) makes the distinction between "organizational policy, administrative policy and operational policy" (p. 5). Others speak to distinctions between "macro-societal" policy issues, where multiple institutions within a locale bear responsibility for solutions, and "organizational" policy issues, where a single institution is faced with the major decision-making responsibility (Qfem, 1975). For example, on the one hand a state board of education may be concerned with policies which relate to how the reading programs in schooling systems can contribute to equal opportunity. On the same agenda, they may be concerned with recommending whether state-approved reading textbooks should focus on "comprehension skills" or "vocabulary skills." The level of policy they are dealing with is very different, and yet the statewide assessment program in reading is probably trying to assist them with both decisions using the same test scores and variables analyzed and reported in the same way. Mann made this point well when he said:

If we are to have a way to improve . . . we must also have a way to exclude data, ignore variables, suppress interactions and focus attention on particular phenomena and relationships that are of interest. Increasing the rigor of the definition of policy, and establishing its descriptive limits, can contribute to our ability to improve the policy decision-making process (p. ).


(2) Such programs seldom have the capacity to produce information which is "issue" oriented at a time when it is most needed by policymakers.




The majority of individuals responsible for large-scale testing programs with whom I have talked feel it is their primary task to produce technically sound data. They usually issue this in a report of some type soon after their analysis is completed. They feel it is somebody else's responsibility to determine what policy issues are suggested by the data.

A few directors assume they also have this latter responsibility but only after the data has been collected and analyzed. I know of at least two statewide assessment programs that use interpretation panels after the data is available and accompany their technical reports with specific recommendations about the type of policy issues which might be addressed. Both of these viewpoints seem to me to ignore the realities of the educational policymaking process and are a major factor as to why such data has little impact. Again, turning to the literature on policymaking, one finds general agreement that it is only at the point that a problem, demand or need becomes a publicly recognized "issue" that it is likely to receive any type of policy action (£tii>tuii r"i^65; Joae:>» Trunin, 3562). And it is at this point that decision-makers are most receptive to objective analysis of data which bears on that topic. How much more efficient and effective our large-scale testing programs would be if they could (1) identify the most consequential "issues" which their target audiences are likely to be facing and design their programs to produce information specific to those issues; (2) move away from the one-shot, once-a-year reporting syndrome and instead build a capacity to deliver the information when it is "timely" to the policymaking needs of the target audiences; and (3) develop the capacity to provide quick turn-around analyses and studies responsive to unanticipated demands which have suddenly risen to the "issue" stage. This latter capacity is not all that unrealistic to expect when you consider all of the other data collection and analysis capacities within an organization which seldom, if ever, are coordinated with the student testing program. If sampling strategies, geographic reporting boundaries and other such technical considerations were carried out in a coordinated way within most of our institutions, we could be using our student performance data far more effectively for dealing with "issue-related" decisions on a more timely and responsive schedule.



(3) Few programs take into account the fact that this policymaking process is characterized by "uncertainty" and also by "competing" value systems.



To put it another way, very few large-scale testing programs now have the capacity to assist their policymakers to identify and analyze the consequences of alternative decisions or to do follow-up research to answer questions prompted by the original assessment results. In most of our programs, we can answer the question of "what" the student performance is in a particular area, but we certainly cannot answer the question of "why" a particular result occurred. We are, in fact, contributing to the "uncertainty" about an issue. Given scarce resources, might it be more effective to concentrate on a few critical subject areas and release existing dollars for follow-up research to provide better answers about "why" some of the data turned out the way it did? If interpretation panels are going to be used to determine the potential policy impact of assessment data, might it not be more effective to have such groups prepare a series of alternative recommendations which includes their best estimate of the educational, fiscal and political consequences of each alternative?



In summary, with the best of intentions and at great personal and fiscal cost, 
we have initiated large-scale testing programs for the purpose of producing 
useful information for target policymaking audiences. We have evidence that 
our efforts have missed the mark, and it is my belief that a contributing
factor is our lack of understanding of and responsiveness to the practicalities 
and realities of the policy process itself. I have suggested at least three 
areas in which improvements should be made. Let me leave you with the following
poem. It is supposedly anonymous, but I think it actually was written
by the director of a large-scale testing program.

If you run very hard
With great effort and strain
You may clamber aboard
The last car, the wrong train.